Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise

Authors

Boris Landa, Ronald R. Coifman, and Yuval Kluger

Abstract

A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the points reside in Euclidean space, a widespread approach is to form an affinity matrix by the Gaussian kernel with pairwise distances and to follow it by a certain normalization (e.g., a row-stochastic normalization or its symmetric variant). We demonstrate that the doubly stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self-loops) is robust to heteroskedastic noise. That is, the doubly stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly stochastic) noisy affinity matrix converges to its clean counterpart with rate $m^{-1/2}$, where $m$ is the ambient dimension. We demonstrate this result numerically and show that, in contrast, popular normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity, where the advantage of doubly stochastic normalization for exploratory analysis is evident.
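To make the construction concrete, here is a minimal sketch of a Gaussian kernel with zeroed main diagonal followed by a symmetric Sinkhorn-type scaling; the bandwidth eps, the damped update rule, and the function names are illustrative assumptions, not the paper's prescribed implementation.

```python
import numpy as np

def gaussian_kernel_zero_diag(X, eps):
    """Gaussian kernel on pairwise squared distances with the main
    diagonal set to zero (no self-loops)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / eps)
    np.fill_diagonal(K, 0.0)
    return K

def doubly_stochastic(K, n_iter=1000, tol=1e-12):
    """Symmetric Sinkhorn scaling: find d > 0 such that
    diag(d) @ K @ diag(d) has unit row and column sums."""
    d = np.ones(K.shape[0])
    for _ in range(n_iter):
        # damped fixed-point update for d = 1 / (K d); the square
        # root stabilizes the plain iteration, which can oscillate
        d_new = np.sqrt(d / (K @ d))
        if np.max(np.abs(d_new - d)) < tol:
            d = d_new
            break
        d = d_new
    return d[:, None] * K * d[None, :]

# sanity check: rows and columns sum to 1 up to the tolerance
X = np.random.default_rng(0).normal(size=(300, 50))
W = doubly_stochastic(gaussian_kernel_zero_diag(X, eps=50.0))
print(W.sum(axis=0).round(6)[:3], W.sum(axis=1).round(6)[:3])
```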


Similar articles

Doubly Stochastic Normalization for Spectral Clustering

In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal, under Frobenius norm, doubly-stochastic ap...
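To sketch the Frobenius-norm idea in code, one can alternate between a closed-form projection onto the affine set of matrices with unit row and column sums and a clipping to the nonnegative cone. This is a hedged sketch under assumptions: plain von Neumann alternating projections reach a doubly stochastic matrix but not necessarily the exact nearest one (Dykstra-style corrections would be needed for that), and the function name is illustrative.

```python
import numpy as np

def nearest_doubly_stochastic(A, n_iter=1000, tol=1e-9):
    """Alternating projections toward the doubly stochastic matrix
    closest to A in Frobenius norm."""
    n = A.shape[0]
    X = A.astype(float).copy()
    ones = np.ones((n, n))
    for _ in range(n_iter):
        X_prev = X
        # closed-form projection onto {X : X @ 1 = 1, X.T @ 1 = 1}
        X = X + (1.0 / n + X.sum() / n**2) * ones \
              - X.sum(axis=1, keepdims=True) / n \
              - X.sum(axis=0, keepdims=True) / n
        # projection onto the nonnegative cone
        X = np.maximum(X, 0.0)
        if np.linalg.norm(X - X_prev) < tol:
            break
    return X
```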


Scalable Kernel Methods via Doubly Stochastic Gradients

The general perception is that kernel methods are not scalable, so neural nets become the choice for large-scale nonlinear learning problems. Have we tried hard enough for kernel methods? In this paper, we propose an approach that scales up kernel methods using a novel concept called “doubly stochastic functional gradients”. Based on the fact that many kernel methods can be expressed as convex ...
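A toy rendering of the idea, assuming squared loss, a Gaussian kernel, and no regularization (the paper's convergence analysis and practical refinements are not reproduced): each iteration draws both a random data point and a random Fourier feature, which is what makes the functional gradient "doubly" stochastic. Prediction cost grows with the iteration count here; as a memory saving, the published algorithm regenerates features from saved pseudo-random seeds instead of storing them.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff(x, omega, b):
    """One random Fourier feature of a Gaussian kernel."""
    return np.sqrt(2.0) * np.cos(x @ omega + b)

def doubly_sgd(X, y, sigma=1.0, T=2000, step=0.5):
    """Doubly stochastic functional gradient descent (sketch):
    each step samples a data point AND a random feature."""
    d = X.shape[1]
    omegas, bs, alphas = [], [], []
    def predict(x):
        return sum(a * rff(x, w, b) for a, w, b in zip(alphas, omegas, bs))
    for t in range(1, T + 1):
        i = rng.integers(len(X))                  # randomness 1: the data point
        w = rng.normal(0.0, 1.0 / sigma, size=d)  # randomness 2: the feature
        b = rng.uniform(0.0, 2.0 * np.pi)
        err = predict(X[i]) - y[i]                # squared-loss residual
        omegas.append(w)
        bs.append(b)
        alphas.append(-(step / t) * err * rff(X[i], w, b))
    return predict
```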


Robust methods for heteroskedastic regression

Heteroskedastic regression data are modelled using a parameterized variance function. This procedure is robustified using a method with high breakdown point and high efficiency, which provides a direct link between observations and the weights used in model fitting. This feature is vital for the application, the analysis of international trade data from the European Union. Heteroskedasticity is...
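As a minimal illustration of fitting a parameterized variance function (assumptions: a log-linear variance model and plain iterated weighted least squares; the high-breakdown robustification this abstract refers to is deliberately omitted):

```python
import numpy as np

def heteroskedastic_wls(X, y, n_iter=10):
    """Toy heteroskedastic regression: variance modelled as
    Var(y_i) = exp(X_i @ theta), estimated by alternating between
    a weighted LS fit for beta and a log-variance fit for theta."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start
    for _ in range(n_iter):
        r2 = (y - X @ beta) ** 2
        # fit log squared residuals -> parameterized variance function
        theta = np.linalg.lstsq(X, np.log(r2 + 1e-12), rcond=None)[0]
        w = np.exp(-X @ theta)                    # weights = 1 / variance
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta, theta
```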


Doubly robust multiple imputation using kernel-based techniques.

We consider the problem of estimating the marginal mean of an incompletely observed variable and develop a multiple imputation approach. Using fully observed predictors, we first establish two working models: one predicts the missing outcome variable, and the other predicts the probability of missingness. The predictive scores from the two models are used to measure the similarity between the i...
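A loose sketch of the score-based idea under stated assumptions (linear working models, a Gaussian kernel on the two-dimensional predictive score, and a hypothetical function name and bandwidth; this is not the estimator from the paper):

```python
import numpy as np

def dr_kernel_impute(X, y, observed, h=0.5):
    """Two working models (outcome regression and missingness
    propensity) give each unit a 2-D predictive score; missing
    outcomes are imputed by a Gaussian-kernel weighted average of
    observed outcomes with nearby scores."""
    obs = np.where(observed)[0]
    mis = np.where(~observed)[0]
    # working model 1: outcome regression fit on complete cases
    beta = np.linalg.lstsq(X[obs], y[obs], rcond=None)[0]
    s1 = X @ beta
    # working model 2: probability of being observed (a linear
    # probability fit stands in for logistic regression here)
    gamma = np.linalg.lstsq(X, observed.astype(float), rcond=None)[0]
    s2 = X @ gamma
    S = np.column_stack([s1, s2])
    y_imp = y.astype(float).copy()
    for i in mis:
        d2 = np.sum((S[obs] - S[i]) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * h**2))
        y_imp[i] = np.sum(w * y[obs]) / np.sum(w)
    return y_imp
```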


Utilize Old Coordinates: Faster Doubly Stochastic Gradients for Kernel Methods

To address the scalability issue of kernel methods, random features are commonly used for kernel approximation (Rahimi and Recht, 2007). They map the input data to a randomized low-dimensional feature space and apply fast linear learning algorithms on it. However, to achieve high-precision results, one might still need a large number of random features, which is infeasible in large-scale applica...
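To ground the premise, a small sketch of random Fourier features for the Gaussian kernel (the sizes, sigma, and the error probe are illustrative choices): the maximum entrywise error of the kernel approximation shrinks roughly like D^{-1/2} in the number of features D, which is why high precision can demand many features.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(X, sigma):
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def rff_map(X, D, sigma):
    """Random Fourier features (Rahimi and Recht, 2007): Z @ Z.T ~= K."""
    W = rng.normal(0.0, 1.0 / sigma, size=(X.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.normal(size=(200, 10))
K = gaussian_kernel(X, sigma=2.0)
for D in (10, 100, 1000, 10000):
    Z = rff_map(X, D, sigma=2.0)
    print(D, np.max(np.abs(Z @ Z.T - K)))  # error decays ~ 1/sqrt(D)
```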



Journal

Journal title: SIAM Journal on Mathematics of Data Science

Year: 2021

ISSN: 2577-0187

DOI: https://doi.org/10.1137/20m1342124